{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# hvPlot.scatter\n",
    "\n",
    "```{eval-rst}\n",
    ".. currentmodule:: hvplot\n",
    "\n",
    ".. automethod:: hvPlot.scatter\n",
    "```\n",
    "\n",
    "## Backend-specific styling options\n",
    "\n",
    "```{eval-rst}\n",
    ".. backend-styling-options:: scatter\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples\n",
    "\n",
    "Scatter plots are useful for exploring relationships, distributions, and potential correlations between numeric variables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Basic scatter plot\n",
    "\n",
    "This example creates a simple scatter plot from a DataFrame by passing column names to `x` and `y`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas  # noqa\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame({\"x\": [0, 1, 2, 3], \"y\": [0, 1, 4, 9]})\n",
    "\n",
    "df.hvplot.scatter(x=\"x\", y=\"y\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next example uses the penguins dataset from `hvplot.sampledata` and sets a plot title with `title`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas  # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.scatter(\n",
    "    x='bill_length_mm', y='flipper_length_mm',\n",
    "    title='Bill Length vs Flipper Length'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouping by categories\n",
    "\n",
    "To distinguish categories visually, use the `by` parameter. It colors points by the unique values of the specified column(s), or by each combination of values when several columns are given. The generated plot is a [HoloViews NdOverlay](https://holoviews.org/reference/containers/bokeh/NdOverlay.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas  # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.scatter(\n",
    "    x='bill_length_mm', y='flipper_length_mm',\n",
    "    by=['sex', 'species'], title='Scatter plot grouped by sex and species with \"by\"',\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "If your goal is simply to color the plot by a categorical variable, use the [`color`](option-color) option instead of [`by`](option-by). The former vectorizes the color styling (i.e., each marker has its own color), while the latter generates an overlay of scatter plots, one per category. As a consequence, using `color` is much more efficient in this case.\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas  # noqa\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "\n",
    "df.hvplot.scatter(\n",
    "    x='bill_length_mm', y='flipper_length_mm',\n",
    "    color='species', title='Scatter plot colored by species with \"color\"',\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(scatter-marker-style)=\n",
    "### Control marker style\n",
    "\n",
    "The marker style is controlled with the styling option `marker`. For Bokeh plots, the option accepts Bokeh-based markers (see the plot below) and a subset of Matplotlib-compatible markers like `'+'`. Note that the Matplotlib-compatible markers cannot be vectorized, i.e., they cannot be mapped from a column. Matplotlib plots accept [Matplotlib](https://matplotlib.org/stable/api/markers_api.html) markers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import itertools\n",
    "\n",
    "import holoviews as hv\n",
    "import hvplot.pandas  # noqa\n",
    "import pandas as pd\n",
    "from bokeh.core.enums import MarkerType\n",
    "\n",
    "# Marker names defined by Bokeh itself\n",
    "bokeh_orig_markers = list(MarkerType)\n",
    "hv_bk_mpl_compat_markers = list(hv.plotting.bokeh.styles.markers)\n",
    "print('Bokeh original markers:')\n",
    "print(*map(repr, bokeh_orig_markers), sep=', ', end='\\n\\n')\n",
    "print('Matplotlib-compatible markers for Bokeh:')\n",
    "print(*map(repr, hv_bk_mpl_compat_markers), sep=', ')\n",
    "\n",
    "df = pd.DataFrame(list(itertools.product(range(6), range(6))), columns=['x', 'y'])\n",
    "df['marker_col'] = bokeh_orig_markers + [''] * (len(df) - len(bokeh_orig_markers))\n",
    "\n",
    "df.hvplot.scatter(\n",
    "    x='x', y='y', marker='marker_col', s=150, title='Bokeh-specific markers'\n",
    ") *\\\n",
    "df.assign(y=df.y+0.2).hvplot.labels(\n",
    "    x='x', y='y', text='marker_col', text_color='black',\n",
    "    text_baseline='bottom', text_font_size='9pt', padding=0.2\n",
    ")"
   ]
  },
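  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Control color and size\n",
    "\n",
    "To vary marker size and color per point, pass numeric column names to the `s` option for size and to `c` (or `color`) for color. The example below also sets the colormap with `cmap` and the colorbar label with `clabel`."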
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.pandas  # noqa\n",
    "\n",
    "df = hvplot.sampledata.earthquakes(\"pandas\")\n",
    "\n",
    "df.hvplot.scatter(\n",
    "    x='lon', y='lat', c='mag', s='depth', cmap=\"inferno_r\",\n",
    "    clabel=\"Magnitude values\", title='Earthquake depth (color by magnitude)',\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scatter plot with scaling and logarithmic color mapping\n",
    "\n",
    "This example shows how to fine-tune a scatter plot by scaling point sizes and applying a logarithmic color scale with `logz=True`. Note that the `scale` option is set to uniformly increase the marker size by a factor of 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import hvplot.pandas  # noqa\n",
    "import numpy as np\n",
    "\n",
    "df = pd.DataFrame({\n",
    "    'x': np.random.rand(100) * 10,\n",
    "    'y': np.random.rand(100) * 10,\n",
    "    'size': np.random.rand(100) * 100 + 10,\n",
    "    'intensity': np.random.lognormal(mean=2, sigma=1, size=100)\n",
    "})\n",
    "\n",
    "df.hvplot.scatter(\n",
    "    x='x', y='y', s='size', scale=3,\n",
    "    c='intensity', cmap='Blues', logz=True,\n",
    "    title='Scatter plot with size scaling and log color'\n",
    ")"
   ]
  },
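  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Xarray example\n",
    "\n",
    "hvPlot also works with Xarray objects once `hvplot.xarray` is imported. Here the air temperature dataset is reduced to a single location with `.sel`, and the `air` variable is plotted against the remaining dimension."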
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot.xarray  # noqa\n",
    "\n",
    "ds = hvplot.sampledata.air_temperature(\"xarray\").sel(lon=285., lat=40.)\n",
    "\n",
    "ds.hvplot.scatter(y=\"air\")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
